Rocchio: Relevance Feedback in Learning Classification Algorithms
Abstract
Given a large number of documents, it is hard to find the ones you need. These days most, if not all, of these documents are available electronically. Information Retrieval (IR) systems help find the documents that satisfy the user’s information need. These systems use many techniques, one of which is learning classification: it uses preclassified training documents to classify the documents in the document base. Rocchio is one such learning classification algorithm. In this article, test data are used to compare Rocchio with two other algorithms. Rocchio usually does not perform well compared to these algorithms, but in this experiment it comes out as the best. When certain techniques are used to boost Rocchio, it can certainly compete with the other algorithms.
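As a rough illustration of how a Rocchio-style classifier learns from preclassified training documents, the sketch below builds one prototype vector per class and assigns a new document to the class whose prototype is most similar. It is a minimal sketch, not the setup used in the article: the whitespace tokenizer, the raw term-frequency weights, the cosine similarity, the `beta` and `gamma` values, and the toy training set are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector; lowercase whitespace tokenizer (an assumption)."""
    return Counter(text.lower().split())


def mean_vector(vectors):
    """Component-wise average of a list of sparse term vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    n = len(vectors)
    return {term: weight / n for term, weight in total.items()} if n else {}


def train_rocchio(labeled_docs, beta=1.0, gamma=0.25):
    """Build one prototype per class: beta * mean(positive) - gamma * mean(negative).

    beta and gamma are illustrative values, not ones taken from the article.
    Negative weights are dropped, a common convention for Rocchio prototypes.
    """
    vectors = [(vectorize(text), label) for text, label in labeled_docs]
    prototypes = {}
    for label in sorted({lab for _, lab in vectors}):
        pos = mean_vector([v for v, lab in vectors if lab == label])
        neg = mean_vector([v for v, lab in vectors if lab != label])
        proto = {}
        for term in set(pos) | set(neg):
            weight = beta * pos.get(term, 0.0) - gamma * neg.get(term, 0.0)
            if weight > 0:
                proto[term] = weight
        prototypes[label] = proto
    return prototypes


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, prototypes):
    """Return the label whose prototype is most similar to the document."""
    vec = vectorize(text)
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))


# Toy, hypothetical training set of preclassified documents.
training = [
    ("stock market shares rise", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sport"),
    ("player scores late goal", "sport"),
]
prototypes = train_rocchio(training)
print(classify("interest rates and the stock market", prototypes))
```

In practice the raw term frequencies would usually be replaced by TF-IDF weights, and `beta` and `gamma` would be tuned on held-out data.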
Similar resources
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the Rocchio algorithm, particularly the word weighting scheme and the similarity metric. It also sug...
Using WordNet to Complement Training Information in Text Categorization
Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the amount of information that TC systems make use of, and thus, to improve their performance. Our a...
Query expansion based on relevance feedback and latent semantic analysis
Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users. Constructing an adequate query that best specifies the user’s information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
Query expansion and dimensionality reduction: Notions of optimality in Rocchio relevance feedback and latent semantic indexing
Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method’s basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in ...
A Relevance Feedback Method for Discovering User Profiles from Text
The huge amount of data on the Internet often makes the user’s search for relevant information difficult. Systems that can support users in this task could therefore be a valuable help. Unfortunately, capturing user interests and representing them in a structured form is in general problematic. Our research deals with the application of supervis...